[WIP] Update Dsv4 B300 configs by wzhao18 · Pull Request #1656 · SemiAnalysisAI/InferenceX

wzhao18 · 2026-06-03T17:26:26Z

Note

Low Risk
Benchmark and runner config only; no auth, serving API, or production inference path changes.

Overview
Switches the dsv4-fp4-b300-vllm recipe to the nvidia/DeepSeek-V4-Pro-NVFP4 checkpoint instead of the upstream deepseek-ai/DeepSeek-V4-Pro weights, and registers DeepSeek-V4-Pro-NVFP4 in the B300 runner’s pre-staged model list so jobs can load from scratch without a fresh HF pull.

The single-node benchmark script no longer passes --moe-backend deep_gemm_mega_moe when data-parallel attention is enabled—that block is commented out, so DP layouts run with default MoE backend behavior for this NVFP4 setup.

^{Reviewed by Cursor Bugbot for commit 41630f2. Bugbot is set up for automated code reviews on this repo. Configure here.}

github-actions · 2026-06-03T17:26:37Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-06-03T17:29:44Z

+- config-keys:
+    - dsv4-fp4-b300-vllm
+  description:
+    - "Update B300 dsv4 image to nvfp4"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652


🔴 Two issues in the new perf-changelog entry for dsv4-fp4-b300-vllm: (1) The pr-link references #1652, but that's the MiniMax-M2.5 PR (the entry immediately above) — this is a copy-paste, the link should point to this PR (#1656). (2) The description says "Update B300 dsv4 image to nvfp4", but the diff only changes the model field; the image (vllm/vllm-openai:v0.22.0) is unchanged, so it should say something like "Update B300 dsv4 model to NVFP4 checkpoint".

Extended reasoning...

Bug 1: Wrong PR link

The new changelog entry at line 3438 sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652. However, PR #1652 is [B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs (already merged at commit 7d4063d) — a completely different PR. The current PR is #1656 ("[WIP] Update Dsv4 B300 configs"). The entry immediately above (line 3432, for minimaxm2.5-fp4-b300-dynamo-vllm) correctly references #1652, so this is almost certainly a copy-paste error from the adjacent entry.

Bug 2: Misleading description ("image" vs "model")

The description says "Update B300 dsv4 image to nvfp4". But the diff in .github/configs/nvidia-master.yaml only changes the model field:

- model: deepseek-ai/DeepSeek-V4-Pro + model: nvidia/DeepSeek-V4-Pro-NVFP4

The image field (vllm/vllm-openai:v0.22.0) is unchanged. In perf-changelog.yaml "image" consistently refers to the Docker container image — e.g. line 3430 of the same file says "Image: vllm/vllm-openai:v0.20.1". The PR's own Cursor Bugbot summary even acknowledges this: "Image, search space, and other recipe fields are unchanged." Calling the model swap an "image" change is misleading.

Step-by-step proof

The PR metadata shows this is PR [WIP] Update Dsv4 B300 configs #1656, titled "[WIP] Update Dsv4 B300 configs".

Recent commit history (7d4063d) confirms PR [B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs #1652 is a separate, already-merged PR titled "[B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs".

In perf-changelog.yaml, the entry at line 3432 (the MiniMax-M2.5 entry) legitimately uses pr-link: .../pull/1652.

The newly added entry at line 3438 (this PR's entry) uses the same pr-link: .../pull/1652 — duplicating the link from the line above instead of referencing [WIP] Update Dsv4 B300 configs #1656.

Reading the diff in .github/configs/nvidia-master.yaml, only model: changes; image: stays at vllm/vllm-openai:v0.22.0. So "Update B300 dsv4 image to nvfp4" describes a change that did not happen, and omits the change that did.

Suggested fix

- config-keys: - dsv4-fp4-b300-vllm description: - "Update B300 dsv4 model to NVFP4 checkpoint (nvidia/DeepSeek-V4-Pro-NVFP4)" pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1656

Impact

Documentation-only — no runtime behavior is affected. But the changelog is the canonical record for tracing perf changes back to their PR, and the incorrect link points readers to an unrelated PR with no context for this model swap.

github-actions · 2026-06-03T17:45:30Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26901571751
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26901571751

github-actions · 2026-06-04T17:06:58Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26901571751
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26901571751

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 41630f2. Configure here.}

cursor · 2026-06-04T19:02:22Z

-fi
+# if [ "${DP_ATTENTION}" = "true" ]; then
+#     MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
+# fi


DP-attn megamoe backend disabled

Medium Severity

With DP_ATTENTION=true, the script no longer passes --moe-backend deep_gemm_mega_moe, but dsv4-fp4-b300-vllm still schedules high-concurrency dp-attn/ep points. That diverges from the prior B300 pareto recipe and from dsv4_fp4_b300_vllm_mtp.sh / B200 vLLM siblings, so those runs may not match the intended serving path.

^{Reviewed by Cursor Bugbot for commit 41630f2. Configure here.}

github-actions · 2026-06-04T22:40:23Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26973148843
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26973148843

claude · 2026-06-08T20:11:03Z

Claude finished @wzhao18's task in 2m 7s —— View job

Review of PR #1656

Gather context and read changed files
Validate master config changes
Validate perf-changelog entry
Validate benchmark script
Validate runner launch script
Post review summary

Summary

Config and runner changes look correct — the model swap to nvidia/DeepSeek-V4-Pro-NVFP4, the pre-staged model list addition, and the --gpu-memory-utilization 0.97 bump are all straightforward.

One issue found:

🟡 perf-changelog.yaml: Description says "Update B300 dsv4 image to nvfp4" but the image (vllm/vllm-openai:v0.22.0) is unchanged — only the model checkpoint changed. This was flagged in the earlier review (the PR-link was corrected to [WIP] Update Dsv4 B300 configs #1656 but the description was not updated). The description also omits the --gpu-memory-utilization 0.97 addition and the MoE backend change. See inline comment with a suggested fix.

No blocking issues found. The rest of the changes pass validation: public Docker Hub image, correct EP conditional pattern, server launch formatted with line continuations, perf-changelog appended at end with correct PR link, and valid model-prefix.

claude · 2026-06-08T20:13:08Z

+- config-keys:
+    - dsv4-fp4-b300-vllm
+  description:
+    - "Update B300 dsv4 image to nvfp4"


🟡 WARNING: Misleading description — says "image" but only the model field changed

Why it matters: The perf-changelog is the canonical record for tracing config changes. "image" in this file consistently refers to the Docker container image (e.g., vllm/vllm-openai:v0.22.0), which is unchanged here. The actual change is the model checkpoint swap to nvidia/DeepSeek-V4-Pro-NVFP4, plus the --gpu-memory-utilization 0.97 addition and MoE backend change. This was flagged in the earlier review (the PR-link was fixed but the description was not).

Fix:

Suggested change

- "Update B300 dsv4 image to nvfp4"

- "Switch B300 dsv4 model to nvidia/DeepSeek-V4-Pro-NVFP4 checkpoint; bump gpu-memory-utilization to 0.97; disable deep_gemm_mega_moe backend for dp-attn"

github-actions · 2026-06-08T22:02:58Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27163993583
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27163993583

Use nvfp4 model

ee16194

wzhao18 requested a review from a team June 3, 2026 17:26

wzhao18 requested review from jgangani and kedarpotdar-nv as code owners June 3, 2026 17:26

github-project-automation Bot added this to InferenceMAX Board Jun 3, 2026

Update PR number

e8ecb38

wzhao18 added the full-sweep-enabled label Jun 3, 2026

claude Bot reviewed Jun 3, 2026

View reviewed changes

wzhao18 and others added 2 commits June 4, 2026 12:00

Stage dsv4 b300

8df7b13

Merge branch 'main' into wzhao/dsv4-b300-nvfp4

41630f2

cursor Bot reviewed Jun 4, 2026

View reviewed changes

wzhao18 marked this pull request as draft June 5, 2026 16:06

wzhao18 added 2 commits June 8, 2026 16:09

Adjust GPU memory utilization parameter

bb079db

Merge branch 'main' into wzhao/dsv4-b300-nvfp4

170000c

wzhao18 marked this pull request as ready for review June 8, 2026 20:10

claude Bot reviewed Jun 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Update Dsv4 B300 configs#1656

[WIP] Update Dsv4 B300 configs#1656
wzhao18 wants to merge 6 commits into
mainfrom
wzhao/dsv4-b300-nvfp4

wzhao18 commented Jun 3, 2026 •

edited by cursor Bot

Loading

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

claude Bot Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 4, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

claude Bot commented Jun 8, 2026 •

edited

Loading

Uh oh!

claude Bot Jun 8, 2026

Uh oh!

github-actions Bot commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	- "Update B300 dsv4 image to nvfp4"
	- "Switch B300 dsv4 model to nvidia/DeepSeek-V4-Pro-NVFP4 checkpoint; bump gpu-memory-utilization to 0.97; disable deep_gemm_mega_moe backend for dp-attn"

Conversation

wzhao18 commented Jun 3, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

claude Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 4, 2026

Choose a reason for hiding this comment

DP-attn megamoe backend disabled

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

claude Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review of PR #1656

Summary

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

wzhao18 commented Jun 3, 2026 •

edited by cursor Bot

Loading

claude Bot commented Jun 8, 2026 •

edited

Loading